Identifying Overlying Group of People through Clustering

Authors

  • P. Manimaran
  • K. Duraiswamy
Abstract

Folksonomies such as Delicious and LastFm are modeled as tripartite (user-resource-tag) hypergraphs in order to study their network properties. Detecting communities of similar nodes in such networks is a challenging problem. Most existing algorithms for community detection in folksonomies assign a single community to each node, whereas in reality users have multiple relevant interests and the same resource is often tagged with semantically different tags. The few attempts to detect overlapping communities work on projections of the hypergraph, which results in a significant loss of the information contained in the original tripartite structure. The authors propose the first algorithm to detect overlapping communities in folksonomies using the complete hypergraph structure. The algorithm converts a hypergraph into its corresponding line graph, using measures of hyperedge similarity, whereby any community detection algorithm for unipartite graphs can be used to produce overlapping communities in the folksonomy. Through extensive experiments on synthetic as well as real folksonomy data, the authors demonstrate that the proposed algorithm detects better community structures than existing state-of-the-art algorithms for folksonomies.

DOI: 10.4018/jitwe.2012100104. International Journal of Information Technology and Web Engineering, 7(4), 50-60, October-December 2012. Copyright © 2012, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

[...]cally impossible for a user to discover, on her own, interesting resources and people with common interests. Hence personalized search (Xu, Bao, Fei, Su, & Yu, 2008) and the recommendation of resources (Konstas, Stathopoulos, & Jose, 2009) and potential friends to users are important.
One approach to these tasks is to group the various entities (resources, tags, users) into communities or clusters, which are typically thought of as groups of entities that have more or better interactions among themselves than with entities outside the group. Folksonomies are modelled as tripartite hypergraphs with user, resource and tag nodes, where a hyperedge (u, t, r) indicates that user u has assigned tag t to resource r. Several algorithms have been proposed for detecting communities in hypergraphs, using techniques such as modularity maximization and the identification of maximally connected sub-hypergraphs. However, almost all prior approaches overlook an important aspect of the problem: they assign a single community to each node, whereas in reality nodes in folksonomies frequently belong to multiple overlapping communities. For instance, users have multiple topics of interest and thus link to resources and tags of many different semantic categories. Similarly, the same resource is frequently associated with semantically different tags by users who appreciate different aspects of the resource. To the best of our knowledge, only two studies have addressed the problem of identifying overlapping communities in folksonomies: (i) Wang, Tang, Gao, and Liu (2010) proposed an algorithm to detect overlapping communities of users in folksonomies considering only user-tag relationships (i.e. the user-tag bipartite projection of the hypergraph), and (ii) Papadopoulos, Kompatsiaris, and Vakali (2010) detected overlapping tag communities by taking a projection of the hypergraph onto the set of tags. Taking projections, as both of these approaches do, loses some of the information contained in the original tripartite network, and it is known that the quality of the communities obtained from projected networks is not as good as that of communities obtained from the original network (Guimerà, Sales-Pardo, & Amaral, 2007). Also, neither of these algorithms considers the resource nodes in the hypergraph.
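
The information loss caused by projection can be illustrated with a small sketch. The two toy folksonomies below, the user, tag and resource names, and the helper functions `user_tag_projection` and `resources_for` are all hypothetical and are not taken from the paper; the sketch only assumes that a folksonomy is stored as a set of (user, tag, resource) triples.

```python
from collections import Counter

# Hypothetical toy data: each hyperedge is a (user, tag, resource) triple.
folksonomy_a = {
    ("alice", "python", "doc1"),
    ("alice", "jazz", "song1"),
}
folksonomy_b = {
    ("alice", "python", "song1"),
    ("alice", "jazz", "doc1"),
}

def user_tag_projection(hyperedges):
    """Drop the resource and count how often each (user, tag) pair occurs."""
    return Counter((user, tag) for user, tag, _resource in hyperedges)

def resources_for(hyperedges, user, tag):
    """Answer a query that needs the full triple: what did `user` tag with `tag`?"""
    return {r for u, t, r in hyperedges if u == user and t == tag}

# The two hypergraphs differ, yet their user-tag projections are identical.
same_projection = user_tag_projection(folksonomy_a) == user_tag_projection(folksonomy_b)
print(same_projection)

# The full hypergraphs still distinguish the two cases.
print(resources_for(folksonomy_a, "alice", "python"))
print(resources_for(folksonomy_b, "alice", "python"))
```

Once the resource is dropped, the two folksonomies become indistinguishable, so any community structure that depends on which resource received which tag can no longer be recovered from the projection.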
However, it is necessary to detect overlapping communities of users, resources and tags simultaneously in order to recommend resources to users in a personalized way. The goal of this paper is therefore to propose an algorithm that uses the complete tripartite structure to detect overlapping communities.

Girvan and Newman (2002) proposed one of the first algorithms for community detection. Their algorithm iteratively removes network edges based on their betweenness centrality, which splits the network into disconnected components. In subsequent work, they introduced the notion of modularity as a measure of the quality of the community structure in a network (Girvan & Newman, 2004). A number of algorithms were then proposed that attempt to detect community structure in a network by maximizing the modularity score. For instance, Clauset, Newman and Moore (2004) proposed an agglomerative hierarchical clustering that successively joins pairs of communities (starting from single-node communities) such that each agglomeration yields the maximum possible increase in modularity. Later, techniques such as simulated annealing, extremal optimization and spectral optimization were presented to maximize the modularity score. See Fortunato (2010) for a detailed survey of community detection algorithms for graphs.

In social networks, every individual typically belongs to more than one community: there are communities of her family members, friends, classmates, co-workers and so on. Hence, a community detection algorithm should address the issue of overlapping communities. Recently, many algorithms have been proposed that detect overlapping communities in graphs. Although a node in a network can be associated with multiple semantic topics, a link is usually associated with only one (Ahn, Bagrow, & Lehmann, 2010). For instance, a user can have multiple topical interests, but each link created by the user is likely to be associated with exactly one of her interests.
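
The idea that each link carries a single topic, combined with the line-graph conversion mentioned in the abstract, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the similarity measure (the number of nodes two hyperedges share), the threshold `min_shared`, and the use of connected components as a stand-in for "any community detection algorithm on unipartite graphs" are all assumptions, and the toy data is hypothetical.

```python
from itertools import combinations

# Hypothetical toy folksonomy of (user, tag, resource) hyperedges.
hyperedges = {
    ("alice", "python", "doc1"),
    ("alice", "python", "doc2"),
    ("bob", "python", "doc1"),
    ("alice", "jazz", "song1"),
    ("alice", "jazz", "song2"),
}

def shared_nodes(e1, e2):
    """Assumed similarity: number of positions (user, tag, resource) that agree."""
    return sum(a == b for a, b in zip(e1, e2))

def line_graph(edges, min_shared=2):
    """Each hyperedge becomes a node; similar hyperedges become neighbours."""
    adjacency = {e: set() for e in sorted(edges)}
    for e1, e2 in combinations(sorted(edges), 2):
        if shared_nodes(e1, e2) >= min_shared:
            adjacency[e1].add(e2)
            adjacency[e2].add(e1)
    return adjacency

def connected_components(adjacency):
    """Stand-in community detector: connected components of the line graph."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components

def node_memberships(hyperedge_communities):
    """Map each typed node to the set of hyperedge communities it appears in."""
    membership = {}
    for index, community in enumerate(hyperedge_communities):
        for user, tag, resource in community:
            for node in (("user", user), ("tag", tag), ("resource", resource)):
                membership.setdefault(node, set()).add(index)
    return membership

communities = connected_components(line_graph(hyperedges))
membership = node_memberships(communities)
print(len(communities))
print(membership[("user", "alice")], membership[("user", "bob")])
```

Every hyperedge is assigned to exactly one community, but a node inherits the communities of all hyperedges it takes part in. In this toy example the user alice therefore ends up in both the "python" and the "jazz" community, while bob belongs to only one.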

Similar resources

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar to each other in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...

Homogeneous grouping of e-learners based on their network behavior

Automatic identification of learner groups based on similarity of learning style improves e-learning systems from the viewpoint of learning adaptation and collaboration among learners. In this paper, a new system is proposed for identifying groups of learners who have similar learning styles, using information about learners’ behavior in an e-learning environment. Proposed clustering method for s...

Identifying the main components of the hospital costs management process

Introduction: One of the most important goals of healthcare organizations is cost management in order to save money, provide quality services and achieve customer satisfaction; however, it poses serious challenges for managers. The current study aimed to identify the main components of cost management in hospitals based on a process-orientation approach. Methods: This qualitative research was ...

Uncertainty Modeling of a Group Tourism Recommendation System Based on Pearson Similarity Criteria, Bayesian Network and Self-Organizing Map Clustering Algorithm

Group tourism is one of the most important tasks in tourist recommender systems. Despite the potential contradictions among the tastes of group members, these systems seek to provide joint suggestions to all members of the group, and propose recommendations that satisfy the group of users as a whole rather than an individual user. Another issue that has received less attention i...

CUSTOMER CLUSTERING BASED ON FACTORS OF CUSTOMER LIFETIME VALUE WITH DATA MINING TECHNIQUE

Organizations have used Customer Lifetime Value (CLV) as an appropriate pattern to classify their customers. Data mining techniques have enabled organizations to analyze their customers’ behaviors more quantitatively. This research has been carried out to cluster customers based on the factors of the CLV model, namely length, recency, frequency, and monetary (LRFM), through data mining. Based on LRFM,...

Journal title:
  • IJITWE

Volume 7  Issue 

Pages  -

Publication date 2012